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(54) Abstract Title 

Retrieving data representing a postal address from a database of postal addresses using a trie structure 

S?fc^f ra P rese, ! tin 9 a P° st ^ address (30) is retrieved from a coded address database (6) representing, in 
the form of a tree of coded postal address elements (27, 28, 29), a multiplicity of postal addresses. A dietary 
(4) formed as a trie data structure is provided, the path from the root node (10) to the leaf nodes 1 1) 
representing respective postal address elements. A processor (2) receives input data (8) comprising one or 
2E2iSJr™ Ik f ^ fi . ndi "9 a P° stal Address (30) represented in the' 'address \!SS^n^ 

entries ^S^SSl^Z ^ !*? eXa< * ly C ^ Q ^ to the search terms and for 

entries (32a 32b, 32c) allow.ng for the possibility of the input data containing one or more errors The 
processor then finds, by reference to a location index (5), the matched coded postal address elements (27a 

ZZ2Z?5t 28C) COrresp ° ndinQ l Z the entrlGS <31 ' 32a ' 32b ' 32c) in th * dictio'nary^^ 
SS^n2^T c °' res P° nd '"9 to the input terms (8a, 8b). The processor (2) then determines which of the 
matched postal address elements (27a, 27b, 28a, 28b, 28c) belong to the same address (30) and then with 

sr^ss=T* ,nde * m decode5 " ,e aa< " ess * ■""<•"' ' « 
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Method of and apparatus for retrieving data representing a 
postal address fr om a database of postal addresses 

The present invention relates to a method of and apparatus 
for retrieving data representing a postal address from a 
database representing a multiplicity of postal addresses and 
also to a computer program product executable in a processor to 
perform such a method and to a data product operable on by a 
processor to enable the method to be performed. 

In particular, the present invention relates to providing 
an apparatus, for example, a computer system that operates on a 
database representing a multiplicity of postal addresses 
enabling a user to retrieve a postal address from the database 
by providing the computer system with input data, for example a 
postal code (referred to in some countries as a zip-code) . 
Such systems are known in the prior art and facilitate 
obtaining full postal address details of a given postal address 
on the basis of partial input data leading to, for example, 
fewer keystrokes being required to be entered by a keyboard 
operator to obtain a full postal address. Furthermore, if the 
database on which the computer system operates is accurate and 
up to date (for example, if the database is provided by the 
relevant postal authorities) , a postal address can be retrieved 
that is both accurate and correctly formatted (i.e. a correctly 
laid out address in accordance with the practice of the postal 
authorities in the relevant region) . 

A common situation where such a system is especially 
useful is when a customer gives details of an address over the 
telephone. The person receiving such information can readily 
access the database of addresses and find the correct address 
on the basis of the information given orally, which being open 
to possible misinterpretation might otherwise lead to incorrect 
address details being entered. Such a system is also of 
benefit when entering address details from hand-written 
information, which may be incomplete or difficult to read. 
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one such colter system o £ the prior art is a c^uter 
pr ogrammed with the product sold ^ ^J^T^ 
PRO V3 (Version 3) . The software of that product 
for use with a database of. postal addresses in the UK^ The 
s^rch engines and the data structures used were designed 
around the British format of addresses and. in particular, the 
-fode system Present, adopted in the U. 1 n the ^ 
postcode represents, on average about 15 ^ 

the system with a P-^^^^-In full details of 
only to enter a house number in oraer c 

a unique postal address. (sorae times 
in countries other than the UK, postal 
€erred to as zip-codes) may relate to many more than 
ad resses and ma/in some countries cover more than one town. 
. L software, in particular the sear, ^^^^Z 
data-structures used in QuickAddress V3 , being 
use with UK addresses, may not therefore be the most 
appropriate for use in a computer system 
addresses relating to a country other t an he ^ 

!o „ ith r"::;:;:: :::: a 8 PR0 ^ 

Version "version 1, . The software of that product was written 
lor use with a database of postal addresses of any country 
fe not limited to the UK, . The software used to search for 

, r,f * searching method known as 

25 postal addresses makes use of a search J of 
pattern matching" . Input terms are converted xnto 
t Lee letter strings which are compared with a store of all 

h1e three letter strings together with the postal 
possible three letter 9 contained within 

a ^ resses having such three letter string 

Such a method of searching can be time consuming and may 
re a significant amount of memory to be available in which 
IZ o re the 'data relating to each possible three letter string 
aid the associated postal addresses (or parts thereof, contain 

35 SUCh ^"p-sent invention therefore seeks to provide an 
proved method of and apparatus for retrieving data . 
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representing a postal address from a database representing a 
multiplicity of postal addresses. The present invention also 
seeks to provide a computer program product executable in a 
processor to perform such an improved method and to a data 
5 product operable on by a processor to enable such a method to 
be performed. 

According to a first aspect of the present invention there 
is provided a method of retrieving data representing a postal 
address from a database representing a multiplicity of postal 

10 addresses, the method comprising the following steps: 
a) providing a first machine -readable database 
comprising data representing a multiplicity of postal 
addresses, each postal address being formed of one or more 
postal address elements, 

15 b) providing a dictionary in the form of a second 

machine- readable database, the dictionary comprising data 
representing entries, each entry corresponding to at least one 
postal address element represented by the data of the first 
database , 

2 0 c) providing a processor able to access the data stored 

in the first and second machine -readable databases, 

d) the processor receiving input data comprising one or 

more input terms for finding a postal address represented in 

the first database, 
25 e) the processor searching the dictionary for entries in 

the dictionary corresponding to the one or more input terms, 

f ) the processor ascertaining information concerning 
data in the first database representing the or each postal 
address element corresponding to the or each entry in the 

30 dictionary determined by the processor as corresponding to the 
one or more input terms, and 

g) the processor outputting data representing the or 
each postal address, if any, represented by the first database 
determined by the processor in view of the information 

35 ascertained in step f) as being in accordance with the input 
data, wherein 



the dictionary is in the form of a tree data structure, 
having a root node and terminating in a multiplicity of leaves, 
the path from the root node to a leaf being representative of 
an element of a postal address. 

Having a dictionary arranged in such a way facilitates, as 
is explained in more detail below, the retrieval of a target 
postal address (i.e. the postal address intended to be 
retrieved by the input data) without the need for relying on a 
particular format of postal code. Furthermore, having a 
dictionary arranged in such a way allows the searching to be 
carried out in a more efficient manner than the prior art 
method using three letter strings. 

It will be understood that a leaf node generally 
represents a termination point within the tree, although in 
relation the present invention a termination point need not 
necessarily be a "pure" leaf node (as is explained further 
below) . 

The databases are preferably in electronic format readable 
by a computer processor. For example, the databases may each 
be partially or wholly stored in RAM, ROM, CD-ROM, disc, tape 
or any other electronic storage media. 

The output data concerning the or each postal address 
found by the processor preferably includes data in addition to 
the data required to represent the letters that form the postal 
address elements of the full address. For example, the 
additional output data may include data representing control 
characters, formatting characters or the like. For example, 
the additional data may comprise data representative of 
carriage returns, end of string data or the like. Preferably, 
the output data is in a form that allows the data to be 
imported into a separate computer application; in such a case, 
for example, the additional output data may, for each postal 
address element of the address, include data indicating the 
category (or type) of postal address element. 

Preferably, the output data is in a form that enables a 
printer, when required, to print the address on a print medium. 



such as for example paper, a label, or the like. The method 
preferably includes a step in which the output data is 
eventually printed out as a postal address onto a print medium, 
the print medium being affixed to, or forming a part of, an 
item of post. The item of post may then be sent to the 
intended recipient via conventional postal delivery services. 

The first database of data representing a multiplicity of 
postal addresses may conveniently be formed as. a tree data 
structure having a root node and terminating in a multiplicity 
of leaves, the path from a root node to a leaf being 
representative of a postal address. Nodes closer to the root 
node may, for example, represent geographical wider regions 
than those nodes closer to the leaves. For example, for a 
database of postal addresses in the USA, the nodes in the first 
level below the root node (i.e. further away from the root) may 
represent the states forming the USA and the nodes in the next 
level may represent counties within each state. 

Each postal address element may be comprised of a line of 
the address. Each postal address element may comprise sub- 
elements, for example, separate words. For example, the words 
"NEW YORK" may form a single postal address element. 
Alternatively, one postal address element may be required to 
represent each word or sub-element of an address. 
Advantageously, there is provided data enabling the processor 
to determine whether a pair of nodes, of the first database, at 
different levels in the tree data structure relate to the same 
postal address. For- example, each node in the tree may be 
assigned an off-set value indicating the distance in memory to 
the next node at the same level, the data being so arranged 
that two nodes stored in memory relate to the same address if 
the offset value associated with the first node in memory (the 
second node being stored further ahead in the linear memory 
store) is greater than the distance in memory from the first 
node to the second node. Such an arrangement may simply be a 
linear store of the nodes, all the descendants of any given 
node being located after that node, in the linear store, and 
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b efore the next node in the linear store « . 
descendant of said given node. For example, if node A has 
fhildren nodes B and C, child node B having children nodes D 
and E and child node C having child nodes F and 0 the order of 
storage of those nodes would be A, B, D, E, C. F. . 

Preferably, the data representing each postal address 
element in the first database comprises a code for the address 
e ement. Having such codes enables the data representing , th 
addresses to taxe up less storage space. For example, certain 
words .nay occur many times within different addresses 
represented by the data. Such a word may be allocated a code. 
ch e code taxing up less space than the space required to 
renresent each of the letters of the word. Each code may be rn 
Z 72 of a pointer pointing to a location in a memory , where 
a representation of the characters forming the postal address 
element is stored. Preferably there is provided a separate 
data store, in addition to the dictionary, enabling the codes 
representing postal address elements to be decoded by reference 
repre a ae parate data store may then 

to the separate data store. mac y tly 
, be designed to allow the processor access to full 

D e o== s elements, whilst the 

formatted representations of the address ele 
dictionary may be formed without distinction to different 
String thereby facilitating a more comprehensive searching 
for address elements corresponding to an input term. 

As mentioned above, the method includes the step of 
J providing a dictionary in the form of the second machine 

readable database, the dictionary comprising data representing 
entries, each entry corresponding to at least one P-^ 
address element represented by the data of the first database. 

,0 Further advantageous or preferred feature • «™»" " 

dictionary will now be discussed. Preferably each entry 
represented by the path from each leaf of the tree to the^root, 
of the dictionary is unigue. Thus, the same address element 
represented in different parts of the first database 

35 corresponds to only one entry in the .dictionary. However, 

there may be more than one dictionary. The dictionary may also 



comprise more than one tree structure. Separate dictionaries 
or separate tree structures of a dictionary, may facilitate 
faster searching. 

Advantageously, the dictionary is so arranged that the 
nodes of the tree data structure each contain data representing 
a portion of the or each entry in the dictionary sharing the 
unique stem defined by the path from the root to that node and 
a plurality of the nodes have a plurality of such portions. 
The data structure may effectively be considered as a modified 
"trie" data structure, wherein each node effectively represents 
a single portion, but. the data representing said portion is 
held in the parent node. The nodes preferably contain data 
relating to a number of separate portions, said number being 
greater than or equal to the number of child nodes. Such a 
structure may facilitate faster searching of the dictionary, 
because the processor can discount child nodes as being 
irrelevant to the search in question without having to follow a 
pointer to such child nodes. 

Preferably, each portion in a node in the dictionary 
either acts as a termination point or has a single path leading 
from it to another node in a lower level in the tree (further 
away from the root node) . Preferably, the or each portion of 
the or each node in the dictionary data structure has no more 
than one child node. It will be understood that a node can 
include a termination point (the node effectively acting partly 
as a leaf and partly as a parent node) . A node may also 
include a plurality of termination points. The data tree may 
include pure leaf nodes (i.e. childless nodes) and mixed 
leaf /parent nodes (i.e. nodes having at least one child and at 
least one termination point) . For example, a node of the tree 
of the dictionary may represent many elements sharing the same 
stem, the stem itself being, a postal address element (for 
example, a node wherein the path from the root to the node 
represents "LONDON" , which is itself an address element, must 
also have pointers to nodes in lower levels if elements such as 
"LONDONDERRY" or "LONDON ROAD" are to be represented) . The 
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dictionary thus preferably includes data representative of end 

of string characters. 

Conveniently the last character of the portion of each 
dictionary entry represented by a termination pornt whether a 
leaf by itself or part of a parent node actrng as a leaf, is 
character or data flag representative of the end of 

Preferably, each portion represented within a node of at 

c -a ninrslitv of nodes is a single 
least some nodes of said plurality oi 

character. Per example, if a node of the tree data structure 
represents the dictionary has tore than two child nodes it 
is preferable that each respective portion of the node is a 

single character. arrange d that at least 

Preferably, the dictionary is so arranged 
see of said portions of the nodes each course a plurality 
characters. For example, in the case of a given node either 
having only one child node or being a pure leaf node, if 
g i::: node could otherwise be represented as a series o single 
child nodes it is preferable for the node to repr* .sent the 
series of nodes collapsed into a single node, so that the given 
contains all the characters as a single ^J^j£* 
oth er„ise be represented by that series of single child^ nodes. 
Ey way of example, consider the dictionary structure that may 
Z used to represent the elements "BAKER STREET,- . "BAKER 
SQUARED. "BRISTOL R0AD1" and -I^DOrt' . only, wherein the % 
, character represents the end of the address 

structure may be as follows: a first node <the root node) 
contains two portions, one containing a the other 

containing an *V ; the portion of the root node points to 
pure leaf node containing the portion , the pointer 

0 from the letter «B» in the root node points to a 

containing the portions "A" and «R" ; the poster from the 



35 



■ nfc ,- 0 a th i r< a node having a single portion 
portion points to a tnira 

, • r, a «KER S"- the pointer from «R" points to a pure leaf 

::: r— l portion ™ - ------ 

"KER 8- Points to a node having the portions »T" and «Q» , the 
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tt T" portion pointing to a leaf representing w REETH" and the U Q" 
pointing to a leaf representing "UAREH" • 

The dictionary structure may be a structure equivalent to 
a trie structure, where the letters in the nodes have 
5 effectively been promoted back towards the root to their parent 
nodes . 

Step e) is preferably performed so that each time the 
processor searches the dictionary, the processor accesses the 
data relating to any given node in the dictionary no more than 

10 once. Step e) may however be performed such that the processor 
searches the dictionary more than once as is explained below; 
in such a case the processor may access a given node in the 
dictionary more than once during step e) , but the processor 
preferably accesses the data in that node no more than once per 

15 search of the dictionary. 

Advantageously, the dictionary is stored in memory 
(whether RAM, ROM or otherwise) such that the processor is able 
to have faster access to those nodes in the dictionary that are 
most commonly accessed compared with the access times relating 

20 to nodes that are accessed less frequently. For example, the 
dictionary may be stored as a linear store of the nodes, the 
root node being stored as the first node in the linear store, 
the child nodes of the root node being stored directly 
thereafter and the child nodes of those child nodes being 

2 5 stored after the last child node of the root node, and so on. 

Storing the nodes of the dictionary in such a way also allows 
the pointers that enable the processor to find the child nodes 
of a given node to be expressed as offset values, wherein those 
offset values are generally relatively small in size (and 

3 0 therefore collectively take up less memory than they might do 

if the nodes were stored in another manner) owing to the fact 
that the children of a given node may be grouped together. 

Preferably, the dictionary may be stored such that nodes 
in a level closer to the root node are stored closer to the 
35 beginning of the linear memory store than nodes in levels 

further away from the root node. Preferably, at least the root 



node is stored in fast memory (i.e. a memory medium, such as 
RAM memory, that the processor is able to access faster than 
other types of memory media, such as for example a hard disk) . 
More preferably, at least the root node and its children nodes 
are stored in fast memory. Even more preferably, the root node 
and the nodes in a plurality of levels below the root node are 

stored in fast memory. 

Step e) may be so performed as to find if there is an 
entry in the dictionary that corresponds exactly to each input 
data term. If more than one input data term is inputted and 
all the data terms have entries in the dictionary corresponding 
thereto, and there is a single postal address that contains 
postal address elements corresponding to all of the dictionary 
entries found by the processor, then the method preferably 
outputs data relating to that single postal address. It is 
likely in such a case that said single postal address is the 
address intended to be retrieved by the input data. 

If however, there are one or more input data terms that do 
not correspond exactly to any entry in the dictionary and/or 
there are a plurality of postal addresses that the processor 
identifies as being possible matches corresponding to the input 
data, or no such postal addresses are found by the processor, 
the method may include further steps to either output data 
indicating the status of the processor's findings (for example 
to inform a user using the method that a single address could 
not be found) and/or to perform further steps in an attempt to 
improve the likelihood of retrieving the intended postal 
address. The provider of the input data initially provided (the 
provider, for example, being a user or a machine such as 
another computer system) might as a part of this process be 
prompted for further or different input data. 

Steps f) and g) of the method may include ascertaining the 
location of each occurrence of data within the first database 
corresponding to the or each entry in the dictionary, 
; determined by the processor as corresponding to the one or more 
input terms, and determining from the locations so ascertained 



-li- 
the postal address or addresses being in accordance with the 
input data. For example, the processor may simply determine 
from the locations of the occurrences the postal address or 
addresses having (or sharing) the greatest number of 
5 occurrences. 

Rather than aiming to retrieve a single address the 
invention may be used to output data containing details of a 
plurality of addresses, it being left to the user to select 
which of those addresses, if more than one are retrieved, is 

10 the target address. 

Step e) thus preferably includes the processor initially 
searching the dictionary for any entry in the dictionary 
identical to the or each input term. If during step e) no 
entry is found that is identical to any of the input terms the 

15 processor may then search the dictionary for entries having a 
lower quality correspondence with the or each input term. For 
example, on not finding any entries in the dictionary identical 
to the input data, the processor may then search allowing for 
one error at first, and if that search fails, performing a 

20 further search, allowing for two errors, and so on. A single 
error may be counted if the search term and the dictionary 
entry differ by one character being deleted, added or replaced 
with a different character. The quality of correspondence 
between two terms may be judged by calculating the 

25 "Levenshtein" edit difference between the two strings. 

As mentioned above, the searching of the dictionary is 
preferably performed so that for each search of the dictionary 
the processor need access the data represented by each node no 
more than once. Therefore, if the processor is to allow for 

3 0 one or more errors in an input term, it is preferred that the 
processor will for any given node allow for more than one of 
its child nodes as representing a possible route through the 
dictionary to a matching entry in the dictionary (allowing for 
said one or more errors) . For example, if the processor is to 

35 allow for one error, all of the child nodes of the root node 
will be of relevance, because the first letter of the input 
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term may be treated as being substitutable by a 

character, as being an erroneous added character or as being 

representative of the second character o £ a given entry in the 

i a r- i-vi^ data input term missing the first 
dictionary (i.e. trie aata 

character of the target dictionary entry, . On allowing for one 
or more errors, the nodes closest to the root node will be of 
^ch greater relevance than nodes on levels further away from 
the root node. Having the data relating to nodes closer to the 
roo t node in fast memory is therefore of great advantage when 
searching the dictionary and allowing for one or more errors 

the input terms. 

If a plurality of input terms are inputted and the 
processor finds during step e) one or more entries in the 
dictionary identical to at least one, but not all. of the -put 
terms, respectively, thereby leaving one or more unmatched 
input terms, then the processor advantageously continues 
searching the dictionary for entries having a lower quality 
correspondence with those unmatched input terms. Such a method 
assumes that if an input term matches a dictionary entry 
exactly then there is a good chance that the input term rs 
actually correct. Put another way, it may be assumed tha : 
t„ere is a relatively high probability that the target postal 
address includes a postal address element identical to the 
input term found in the dictionary, it being assumed that 
, relatively unliKely for an incorrect data term (i.. one 
containing an error, to correspond exactly to a 
element of a different and therefore incorrect postal address 
F or exa^le. if the input data includes the ter^s and 

. „ » uc .tTHROW" would be matched, but the term 

"HEATHROW", the term HEATHROW wo „._ oh 
0 ^ONPROrr would not; the processor would then proceed to search 
for dictionary entries corresponding to "LONDRON" allowing for 
one error, but the processor would not search for the term 

"HEATHROW" again. 

Similarly, the method may be such that if. on reducing the 
35 quality of correspondence required for matching an input data 
Term with a dictionary entry, further terms are matched, but 
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other terms are still left unmatched, searches of entries 
having correspondence of even less quality need only be 
conducted on those remaining unmatched terms. Such a method of 
searching may save considerable time that would otherwise be 
spent on searching for lower quality matches for terms that 
have already been found to correspond to dictionary entries 
with a relatively high quality correspondence. 

The searching of the dictionary may alternatively search 
for entries corresponding to the input data terms, the quality 
of correspondence being within a given threshold, which may be 
pre -set and may be fixed. 

The postal address elements forming a postal address may 
notionally be divided into categories. The categories may 
simply be the level in the tree of the first database in which 
the postal address element appears. The categories may be 
representative of the type of postal address element. There is 
preferably provided data enabling the processor to ascertain 
the category of a given postal address element represented by 
data in the first database. Such data may implicitly be 
provided by the structure of the first database. The method 
may thus be able to distinguish between postal address elements 
being formed of the same characters, but being of a different 
category. For example, if there were entries in the first 
database relating to both a town and a county named "ABCDEF" , 
it would be beneficial if the processor were able to 
distinguish between the two. 

In the case where a given postal address element may be 
attributed with or assigned a category, the processor is 
preferably able to be provided with input data including an 
indication of the category of postal address element that each 
of at least one of the input terms represents. The input data 
received by the processor in step d) may be processed by the 
processor before step e) is performed. Alternatively the input 
data received by the processor in step d) may be pre-processed. 

The processing of the input data may for example, be for 
the purpose of reducing the likelihood of a postal address not 
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„ A _ svntax between the 

bei ng found through differences m data syn 

- and the postal address elements representea y 

input terms and the po databases. For 

of either or both of the rirsc 
data ot eitne forme d of certain characters 

example, the dictionary may be formed ot _ 
exampxe, „ rtCt -«l address elements either 

nn1v "illeqal" characters of postal aaares 

only, mega dict ionary or being represented by 

not being represented m the dictionary 

not being f different order. For example, the 

different characters or in a different 

entry in the dictionary corresponding to a postal address 
enLLy « v ^i,ir?^ the space character, 

element including a space, may exclude the sp 

A lso the entries in the dictionary may, for ^"'^ 
represented without using = J- £^ » ^ 

:::: it::::: zv;::«z „ — - - ^ 

in step e) may then be case insensitive. 

in suey ^ rlictionary entries may be 

Entries represented in the dictiona y 

£orm ed such that information concerning the ^J^^ 

•~ ^ninripd For example, -L u ni y iA 
en address is excluded, ho stree t" Thus, when an 

represented in the dictionary as "High Street . ' 
Input term starting with a number is provided as par o the 
incut data, such data may be processed, before step e, is 
eerformed so to remove the number from the beginnrng of the 
innutTerm before searching the dictionary for corresponding 
eXrLs Once an entry has been found that matches the input 
entries, o delet ed) the processor may then ascertain 

t erm (with the number delete *> * 9 to che input term 

- whether the postal address element relating t 

Is represented by a node having nodes representing premise 
k . ,s its child nodes and whether or not any of those 
"odes r /relents the number removed from the input data. 
ctiiJ-a no« *> , addreS s element includes a 

in the case where a postal adores*. 

. , opiate to a premise number, the 

,0 number, which does not relate , J pre£erably ^presented 

corresponding entry in the dictionary f 

en entry having the relevant number moved to the end of 

, n «f P rablv not containing any data 

entry, the dictionary preferaoiy iw- 

„ im Y^r S The processing of the input data 
relating to premise numbers. me p 

i HiL a number for example a number appearing at the 
35 including a preferab i y comprises of effectively 

beginning of the term, prereraciy 
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splitting the input term into two terms, one term including the 
number at the end and the other term excluding the number. 
Thus the processor is able to match postal address elements 
with input terms containing numbers whether or not the numbers 
are representative of premise numbers. Since the two split 
terms share the same stem, the processor is able to search the 
dictionary for the two terms in parallel without needing, 
whilst considering the characters in that stem, to access any 
more nodes than when searching for only one of the two input 
terms. Treating numbers in this way therefore saves on storage 
space for data without significantly increasing processing time 
and may even reduce the average processing time required to 
search the dictionary. 

The processing of the input data may also include 
considering whether any given input term includes a set of 
characters more susceptible to errors (human error) than other 
sets of characters. The processor may be programmed to 
recognise such strings, each string being associated with one 
or more different strings with which it is commonly replaced in 
error when inputting input data. Preferably the strings 
associated with a given string relate to terms that are 
conceptually similar to the given string. For example, the 
string "STREET" or the string "LANE" might be inputted as part 
of the input data relating to a given address where the correct 
string is actually "ROAD" . The processor is preferably 
programmed to search the dictionary in a manner that accounts 
for such a string as being replaceable with another 
conceptually similar string. For example, if the input term is 
"RED LION ROAD" the processor is able to recognise that the 
string of characters "ROAD" might have been entered in error 
for the string "STREET" . 

Advantageously, the processing of the input data includes 
ascertaining whether any of the input terms correspond to a 
category of postal address element and if so including an 
indication of the category in the input data. For example, the 
processor may be programmed to recognise whether an input data 
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code, zip code or the lxl») . -d ^ 

che basis that the data term r ^ per£orming 

• ~ ^riraired anl ordered by category and thus 
th e method, -V ^ ^ _ efficient . There 

^ To^Ue. Separate dictionaries for entries relatrng 

to postal elements of ^jT^r characteristics, such 

The category may be based o ^ ^ ^present 
as for example, the number of characters 

tb. postal address element^ database3 . 
The data, in particular the fx . ^ electronic 

used when performing the method is P ^ ^ 

i e via a keyboard or other manual data entry PP 
example via a key pro vided as a visual 

The output data may may altern atively, or 

3 indication on a VDU. The confirmatio „ is 

additionally (for example, after inserted) into 

v. „ M -r-^ be electronically pasceo. vx 
Mde by the user) be el por 

a separate data storage area on application 
exanvple. the output data may be pasted 
l5 running on a computer separate daCe store. 

The input data * con3ist o£ daCa stored 

F or example, the J. har d d rive or 

in memory (whether *™ ud£ daCa relating to an 

otherwise, . The data store -^Jf 1 information. The 

3 „ existing database ^^^Tr to highlight errors 
out put data may then - ^ dacabase . 

itr: - «• — - or in r 

Tth a separate application running on a ^ 

wording to ^J^ j J^ data representing 
there is also provided a met no 
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a postal address from a database representing a multiplicity of 
postal addresses, the method comprising the following steps: 

i) providing a processor, 

ii) providing a database, accessible by the processor, of 
5 data representing a multiplicity of postal addresses, a 

dictionary of terms corresponding to those found within the 
postal addresses, and location information enabling the 
processor to ascertain the one or more postal addresses in the 
database having a term corresponding to each dictionary entry, 

10 iii) providing the processor with input data for finding a 

postal address in the database, the processor then searching 
the dictionary for entries corresponding to the input data and 
ascertaining from the location information if any postal 
address in the database corresponds sufficiently closely to the 

15 input data, and 

iv) outputting data relating to the results of step iii) . 
The database operated on during the performance of the 
method according to the second aspect of the invention may 
effectively comprise 

20 a first data structure representing a multiplicity of 

postal addresses, each postal address being formed of one or 
more postal address elements, the first data structure 
comprising respective codes representing respective postal 
addr e s s el ement s , 

25 a second data structure, in the form of a dictionary, 

comprising a multiplicity of entries, each entry corresponding 
to at least one postal address element represented by the data 
in the first data structure, 

a third data structure linking each code in the first data 

3 0 structure to data from which the postal element represented by 
the code can be directly ascertained, and 

a fourth data structure comprising data linking a given 
entry in the second data structure with each item of data in 
the first data structure representing the postal address 

3 5 element corresponding to the entry in the second data 
structure . 



18 



Alternatively, or additionally, the dictionary may be in 
the form of a tree data structure, having a root node and 
terminating in a multiplicity of leaves, the path from the root 
node to a leaf being representative of term within a postal 
address in the database. 

It will be readily appreciated by those skilled in the art that 
features of the first aspect of the present invention may be 
incorporated into the second aspect of the present invention 
and vice versa. For example, during step iii) , the processor 
advantageously initially searches the dictionary for entries 
corresponding exactly to the input data and then, if one or 
more terms included in the input data are matched but other 
terms are not, the processor preferably continues the search in 
the dictionary for entries having a lower quality 
correspondence with those unmatched terms, whilst not searching 
for further entries in the dictionary for those terms where 
entries exactly matching those terms have already been found. 

Another example of features of the first aspect of the 
present invention that may be incorporated into the second 
aspect of the present invention, are those features relating to 
the input data provided to the processor being processed or 
pre-processed before the dictionary is searched. Thus, in step 
iii) the input data may be processed (or pre-processed) by the 
processor, for example, to reduce the likelihood of a postal 
address not being found through differences in syntax between 
the input data used to searched the dictionary and the data 
representing the multiplicity of postal address of the database 

provided in step ii) . 

According to the first aspect of the invention there is 
also provided an apparatus for retrieving data representing a 
postal address from a database representing a multiplicity of 

postal addresses, 

the apparatus including a computer processor, 
the apparatus being provided with 

a first database, accessible by the processor, 
comprising data representing a multiplicity of postal 



- 19 - 

addresses, each postal address being formed of one or more 
postal address elements, and 

a dictionary in the form of a second database, 
accessible by the processor, comprising data representing 
entries, each entry corresponding to at least one postal 
address element represented by the data of the first 
database, wherein 

the dictionary is in the form of a tree data 
structure, having a root node and terminating in a 
multiplicity of leaves, the path from the root node to a 
leaf being representative of an element of a postal 
address, and 

the processor is programmed to be able 

to receive input data comprising an input term for 
finding a postal address represented in the first 
database , 

to search the dictionary for entries in the 
dictionary corresponding to an input term, 

to ascertain information concerning data in the first 
database representing the or each element corresponding to 
the or each entry in the dictionary determined by the 
processor as corresponding to an input term, and 

to output data representing the or each postal 
address, if any, represented by the first database 
determined by the processor as being in accordance with 
the "input data. 

The apparatus may, of course, be arranged to be able to 
perform a method according to the first aspect of the present 
invention. 

According to the second aspect of the invention there is 
also provided apparatus for retrieving data representing a 
postal address from a database representing a multiplicity of 
postal addresses, the apparatus including 

a computer processor and one or more databases, accessible 
by the processor, of data representing a multiplicity of postal 
addresses, a dictionary of terms found within the postal 
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addresses, and information enabiing the processor to lin* 
gl ven entry in the dictionary with the one or more postal 
addresses in the database having a term corresponds to 

diCti : n hr p ™r being programed to be able to receive input 

t „Mr-ess in the database, to search 
data for finding a postal address in tn 

the dictionary for entries corresponding to input data, to 
tscertain if any postal address in the database corresponds 
S U rcTntly closely to the input data, and to send output data 

relating to one or more ^J^^^Z to 
The apparatus may, of course, 

t „ r-o the second aspect of the present 
perform a method according to the secona 

"^"apparatus according to any aspect of the Mention 

for example, be a conventional computer system loaded with 
The appropriate software and provided with the appropriate 

data 'xne present invention yet further provides a <=°™P ut « 
program product executable in a processor to perform a method 
program p DreS ent invention as described 

, according to any aspect of the prese 

above, when provided with the appropriate data for the 

ooerate on. The computer program 
ra^dprocesso J- ^ ^ on „ 

^nl^ clrier, such as a computer, ROM. «. - ROM 
magnetic disc or tape or any other form of electronic recording 
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media . 
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. .„„ al<so crovides such a computer 
The present invention also proviu. 

• i-v> a data product, the data product 
nroaram product together with a data proau^ 

^H/a processor once programed with the 
oroduct to perform the method according to any aspect of the 
resent invention as described above. The data product may be 
fn the torn, of data stored on an electronic data carrier, such 
la a computer, ROM, RAM, C, ROM, magnetic disc or tape or any 
other form of electronic recording media. 

M,«t the postal addresses 
It will be appreciated that tne p 

, v. -v.. j,t a referred to above need not each 
represented by the data reierrc 
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represent a unique postal address in reality. For example, the 
postal address represented by the data may require the addition 
of a name of a person (an individual, or corporate body for 
example) and/or the number or name of the relevant premises. 
5 Such data may of course be manually added to the output data 
before the output data is used to mail any items to the 
intended postal address. 

According to yet another aspect of the present invention 
there is provided a data product, accessible by a computer 
10 processor, the data product including 

data representing a multiplicity of postal addresses, each 
postal address being formed of one or more postal address 
elements , 

a dictionary comprising data representing entries, each 
15 entry corresponding to at least one postal address element 

represented by data in the data product, wherein the dictionary 
is in the form of a tree data structure, having a root node and 
terminating in a multiplicity of leaves, the path from the root 
node to a leaf being representative of an element of a postal 

2 0 addr e s s , and 

data linking a given entry in the dictionary with the one 
or more postal addresses in the data product having a term 
corresponding to the dictionary entry. 

Such a data product advantageously enables a suitably 
25 programmed computer processor to search the dictionary for 

entries in the dictionary corresponding to an input term, and 
to find data in the first database representing the or each 
address element corresponding to the or each entry in the 
dictionary determined by the computer processor as 

3 0 corresponding to an input term, whereby the data product may be 

used to find a postal address represented by the data product 
in response to input data comprising one or more input terms. 

The present invention also provides a data product, 
accessible by a computer processor, the data product including 
35 a first data structure representing a multiplicity of 

postal addresses, each postal address being formed of one or 
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more postal address elements, the first data structure 
comprising respective codes representing respective postal 

address elements, 

a second data structure, in the form of a dictionary, 
comprising a multiplicity of entries, each entry corresponding 
to at least one postal address element represented by the data 
in the first data structure, 

a third data structure linking each code in the first data 
structure to data from which the postal element represented by 
the code can be directly ascertained, and 

a fourth data structure comprising data linking a given 
entry in the second data structure with each item of data in 
the first data structure representing the postal address 
element corresponding to the entry in the second data 

15 structure . 

Such a data product advantageously enables a suitably 
programmed computer processor to search the second data 
structure for entries corresponding to an input term, on 
finding an entry to find data in the first database 
representing the or each address element corresponding to the 
or each entry in the dictionary determined by the computer 
processor as corresponding to an input term, whereby the data 
product may be used to find a postal address represented by the 
data product in response to input data comprising one or more 

25 input terms. 

As has been mentioned above, provided a separate data 
store (the third data structure) , in addition to a dictionary 
(the second data structure), enabling the codes (e.g. in the 
first data structure) representing postal address elements to 
be decoded by reference to the separate data store enables that 
separate data store to be designed to allow the processor 
access to full and correctly formatted representations of the 
address elements, whilst the dictionary may be formed without 
distinction, for example, to different formatting thereby 
35 facilitating a more comprehensive searching for address 
elements corresponding to an input term. 
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Throughout the above general description of the invention, 
and below in the claims, various databases and data structures 
have been described in a way which might suggest that data is 
formed either as a unitary data collection or as a group of 
5 separate but interconnected data collections. As will be 

appreciated, there are many ways in which the present invention 
may be implemented provided that the effective underlying 
structure of the computer program product, computer software, 
and/or data is in accordance with the principles as set forth 
1 0 above . 

By way of example, an embodiment of the invention will now 
be described with reference to the accompanying drawings of 
which: 

15 Figure 1 is a schematic block diagram giving an overview 

of operation of a system according to the invention, 

Figure 2 is a schematic diagram illustrating how postal 
address data is arranged within the system, 

Figure 3 is a schematic diagram illustrating how the 

2 0 dictionary of the system is arranged, and 

Figures 4a and 4b are schematic diagrams illustrating in 
greater detail the operation of the system. 

Figure 1 shows -a system 1 comprising a processor 2 and a 
data base 3. The database 3 comprises a dictionary 4, a 
25 location index 5, a coded postal address data store 6 and a 
postal address element decoding index 7 . The processor 2 is 
able to access the data stored in the database 3, to receive 
input data 8, generally in the form of search terms relating to 
at least part of an address to be searched and to send output 

3 0 data 9, generally in the form of a full and correct postal 

address . 

The coded postal address data store 6 includes 
representations, in the form of codes, of a multiplicity of 
postal addresses, each postal address being formed of at least 
3 5 one postal address element. For example, a postal address may 
comprise a premise name element, a house number element, a 



street name element, a town element, a county element and a 
postal code element (such a postal address thus consisting of 
six postal address elements) . The actual address elements 
being represented in the coded postal address data store 6 as 
codes are able to decoded by the processor 2 with reference to 
the postal address element decoding index 7. 

The dictionary 4 comprises entries relating to each 
different postal address element occurring in the index 7. 
Each entry in the dictionary 4 may therefore correspond to many 
different entries within the coded postal address data store 6. 
The location of each entry in the coded postal address data 
store 6 corresponding to a dictionary entry can be ascertained 
by the processor 2 by reference to the location index 5. 

The operation of the system 1 may be summarised with 
reference to Figure 1 as follows. A user enters input terms 8, 
as strings of characters, which are received by the processor 
2 The processor pre-processes the input terms 8 (as will be 
explained in further detail later) and then searches the 
dictionary 4 for entries corresponding to the input terms 8. 
On finding entries in the dictionary 4 corresponding to the 
input terms 8 the processor 2 then ascertains, by means of the 
location index 5, the locations in the coded postal address 
data store 6 corresponding to the dictionary entries matching 
the input terms 8 . If the processor 2 ascertains that there is 
a single postal address represented in the coded postal address 
data store 6 with postal address elements matching all of the 
input terms 8 then the processor 2 decodes the data in data 
store 6 corresponding to the postal address by reference to the 
postal address element decoding index 7 . The results are then 
, returned to the user as output data 9. The output date 9 can, 
for example, be displayed on a VDU (not shown) and may be 
pasted into whichever application on the computer system the 
user wishes to have the address output data entered. 

For example, if the user enters the input terms "RED LION 
5 STREET" and "LONDON" , entries in the database corresponding to 
the addresses "Red Lion Street, Southampton" and "High Holborn, 
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London" will each contain only one match for the input terms 8 
entered, but the address "Red Lion Street, London" would have 
two matches and would be chosen by the processor 2 as the 
appropriate address to be returned to the user as the output 
data 9. 

Figure 2 shows schematically the arrangement of the 
dictionary 4. The dictionary 4 is arranged as a tree 
structure, having a root node 10 and terminating in a 
multiplicity of leaves 11. The path from the root node 10 to a 
leaf 11 being representative of a postal address element. The 
dictionary structure may be described as a modified trie 
structure. In a conventional trie structure each node of the 
tree represents a single character of a word, the path from the 
root to a leaf spelling out the word represented by the leaf. 
The present data structure however, has nodes comprising the 
letter or letters represented by its child nodes. The 
structure may be thought as a trie structure where the 
characters of each node have been promoted to the node above 
(the parent node) , each node thus possibly representing many 
characters (or a single string of characters - as discussed 
below) but each character being associated with only one branch 
to a lower level. Thus the root node 10 of the present data 
structure includes the initial characters of all of the entries 
in the dictionary, the nodes on the next level down each 
contain the second letters of entries in the dictionary with a 
given first letter. For example in Figure 2, node 12a pointed 
to by pointer 15a associated with the letter "B" of node 10 
contains details of the second letters of all entries in the 
dictionary starting with the letter «B" . In other words the 
tree is arranged such that nodes effectively represent a single 
letter, but the information concerning what that letter is, is 
held in the parent node together with information concerning 
other sibling nodes. 

One important exception to the nodes each representing one 
35 or more single letters is shown in Figure 2. Node lid includes 
the letters "ANYf " so that the path from the root node reads 
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-BOTANY!" • The characters at the end of the entry are coined 

intoTsin 9l e leaf, rather than having a string of single child 

^terminating in a single leaf. The data space reared to 

„ m „„ thus be reduced. Node lid is, as can 
hold the dictionary 4 may thus De reo 

be seen from Figure 2a. a leaf node but it is possible for the 
be seen * where the 

dictionary to comprise nodes that are not le 
node represents a plurality of characters representing 
ILionary entries sharing the same stem followed by thos 
characters, and possibly other characters «*•«-««• 
dictionary is arranged such that only single child nodes 
contain such a string of characters. 

Figure 3 shows schematically how data is arranged in 
coded postal address data store 6. The data 6 is stored as a 
each node representing a postal address element, and the 
, path from the root node 16 to a leaf node 21 representing 
postal address. The tree structure shown in Figure 3 is 
arranged such that the regions represented by nodes within the 
«ee Lome smaller the closer the node is to a leaf node 21. 
Hodes 17 in the level below the root node 16 represent a 
0 county, nodes 18 on the level below that representing t«n-. 

the nodes 1, below that representing street names the nodes 20 
below that representing postal codes and the leaf nodes 21 
representing house numbers or names. Rather than -pre-ting 
each character of the postal address element represented b^a 
, s Ida the nodes contain codes representative of a postal address 
element. For example, if node 18a represents a town named 
* LONDONVJAY " and the node 19b represents a street also named 
"LONDONWAY" , the contents of both nodes ISa and 1» would 
include a code representative of the word "LONDONWAY . The 
30 Processor a is able to decode the codes in the coded postai 
address store 6 by reference to the postal address element 

deC °Te nodes In the coded postal address store 6 are actually 
stored in memory (whether RAM. ROM or otherwise) as a linear 
, 5 lata store Each node in the coded postal address data store 6 
" "eludes information regarding the location in the linear data 
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store of the next node after its children and their 
descendants. The data store 6 is thus arranged linearly in 
memory (whether RAM, ROM or otherwise) , each node being 
immediately followed by its children so that children may be 
5 separated by their children, if any, but not by nodes in a 
level closer to the root node. For example, if node A has 
children nodes B and C, child node B having children nodes D 
and E and child node C having child nodes F and G the order of 
storage of those nodes would be A, B, D, E, C, F, G. The 

10 associated information with a given node regarding the location 
of the node immediately after the last of its direct 
descendants, if any, is in the form of an offset value (i.e. a 
value representative of the distance in the linear data store 
in memory between the two nodes) . Thus the processor 2 is able 

15 to determine with reference to the data store 6 as to whether 
or not two nodes relate to the same postal address, by 
calculating whether the node stored further along in the data 
store 6 is within a distance less than the offset distance 
associated with the two nodes. When input terms 8 are entered 

20 that relate to many occurrences within the coded postal address 
data store 6 the processor 2 is therefore able to ascertain 
whether there is a single address containing an occurrence 
corresponding to each term 8 entered (or which, if any, of the 
addresses have the most occurrences compared with the other 

25 addresses) . 

The searching of the dictionary 4 via the processor 2 will 
now be described, in more detail, with reference to Figure 2. 
Firstly, the data input terms 8a and 8b are pre-processed by 
the processor to convert all upper case letters into lower case 

3 0 letters, to remove non alpha-numeric characters including all 
punctuation marks (including, for example, space characters, 
apostrophes, quote marks, full stops and the like), and to 
expand abbreviations (for example, expanding "ST" at the end of 
an input term to "STREET" , expanding "RD" to "ROAD" , expanding 

35 "N" to "North", "W" to W WEST" and so on). 
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If the input term includes an ambiguous abbreviation the 
processor splits the term into two terms, one in the 
abbreviated form and one in the expanded form. Splitting a 
term into two can avoid not matching a input data string with a 
postal address element, where the postal address element 
concerned contains a letter that is not in fact an abbreviation 
(for example, there may be premises known as "The Big W<< ) - If 
the term is not ambiguous, the processor may not split the 
input term into two (for example it may be assumed that "RD.« 
is an unambiguous abbreviation for "ROAD" ) . 

Also, if the input data term starts with a number, that number 
is removed from the beginning of the data term and sent to the 
end Moving numbers to the end of data strings facilitates 
better searching of the dictionary, where numbers are also 
represented at the end of the entries. The dictionary 4 is 
also formed in such a way that all premise numbers are excluded 
from the dictionary to facilitate more efficient searching. 
Again, if the input data term includes a number the processor 
may search for entries in the dictionary corresponding either 
to the data term with the number moved to the end and also the 
data term with the number removed. 

The input data terms may also be accompanied with data 
specifying the postal address element type to be searched in 
relation to that given data term. For example, the user may 
specify that one of the data terms entered is a postal code; 
the processor during the subsequent searching then being able 
to ignore matching data of a different type. 

If the processor 2 were instructed to search for an entry 
in the dictionary identical the input term -TO*V the processor 
, would start at the root node 10, find the first letter «T» of 
»T0W pointing (pointer 15c) to node 12c where the letter "0" 
would be found, which in turn is associated with a pointer 15b 
pointing to node 13c where the letter «W« would be found, which 
is associated with a pointer I5d pointing to node 11c, a leaf 
5 node. The leaf node is, in this case., an end of string 

character («V>. because in the dictionary illustrated there 



are no other entries sharing the stem "TOW" . The leaf node lie 
points (pointer 15e) to a position in the location index 5 
where data concerning the occurrences in the coded postal 
address data store 6 corresponding to "TOW" (the entry found in 
the dictionary) is provided. There may of course be more than 
one such occurrence in the data store 6. 

If an input data term 8 does not correspond exactly with 
an entry in the dictionary 4, the processor 2 will search the 
dictionary again allowing for one error. A single error, for 
the present purpose, is counted as a substitution of a 
character, a deletion of a character or an addition of a 
character. Allowing for one such error, given the input data 
term 8 "TOW", would, in respect of the dictionary illustrated 
in Figure 2, yield the results "BOW", "STOW" and "TO", in 
addition to "TOW". m the case of "BOW" the letter "T" has 
been substituted with the letter «B» , in the case of "STOW" the 
letter "S" is added and in the case of "TO" the letter "W" has 
been deleted. (it will be noted that node 13c contains other 
characters, as well as an end of string character because there 
are, in addition to the word "TO", other words sharing the stem 
"TO" such as, for example, "TOW". The node 13c therefore 
effectively acts, in part, as a leaf node. The processor 2 if 
unsuccessful in finding a postal address corresponding to the 
input terms 8 may allow for more errors in one or more of the 
input terms. If one input term 8 is matched without error, it 
is assumed that such an input term 8 is correct, unless, that 
is, it becomes apparent to either the processor 2 or the user 
that the input term is not correct. For most situations, 
assuming that an exactly matched input term is correct saves 
time on searching for close, but not exact matches, that would 
otherwise turn out to be irrelevant. 

Figures 4a and 4b illustrate, with schematic diagrams, a 
search for an address 30 within the coded postal address data 
store 6. As shown in Figure 4a two input terms 8a, 8b are 
inputted by the processor (not shown in Figure 4a) and are then 
searched in the dictionary 4. The results from the search of 
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the dictionary 4 include a match 31 identical to term 8b but no 
matches identical to term 8a. The processor 2 then searches 
the dictionary 4 again for entries corresponding to term 8a but 
allowing for one error (any of a deletion, addition or 
substitution) - That search reveals three matches 32a, 32b and 
32c The processor then ascertains via the location index 5 
the locations of the entries in the coded postal address data 
store 6 corresponding to the matches found. As shown in Figure 
4a, two entries 27a and 27b in the data store 6 are found 
relating to dictionary entry 31 and there are three entries 
28a 28b, 28c in the data store 6, each one corresponding to 
one' of the three dictionary entries 32a, 32b and 32c. From the 
location data in the data store 6 the processor is able to 
ascertain that the entries 27a and 28c in data store 6 
corresponding to dictionary entries 31 and 32c are located 
within a group of data representing address 30. The processor 
then decides that this is the address corresponding to the 
input terms 8a and 8b. With reference to Figure 4b the 
processor then decodes the codes in the nodes 29, 28c, 27a 
D relating to the address 30 held in the data store 6 with 

reference to the postal address element decoding index 7 . The 
decoding index 7 includes entries 7a, 7b, 7c, 7d enabling the 
processor to ascertain the full postal address element 
represented by a given code. The processor is thus able to 
5 output the full and correctly formatted address 9, comprising 
address elements 9a, 9b, 9c corresponding to the nodes in the 

coded address store 6 . 

As mentioned above, it will be appreciated, that there are 
many ways in which the present invention may be implemented 
30 provided that the underlying structure of the computer program 
product, computer software, and/or data is in accordance with 
the principles as set forth above. It will also be understood 
that the invention is not limited to the embodiment described 
above with reference to the drawings, but is capable of 
35 numerous rearrangements, substitutions and modifications 
without departing from the spirit of the invention. Such 
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alternatives will be readily apparent to those skilled in the 
art and are encompassed within the spirit of the invention and 
the scope of the claims appended hereto. 

For example, the dictionary entries and the pre-processing 
performed on input terms may differ from country to country. 
For example in some countries, it is common for numbers to form 
a part of the address in addition to premise numbers and may 
need to be treated differently. In other countries, non- 
alphanumeric characters may also have greater significance than 
countries such as the UK where those characters may effectively 
be ignored when searching the dictionary. 
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Claims 



x A method of retrieving data representing a postal address 
fr om a database representing a multiplicity of postal 
5 Presses, the method comprising the following . steps: 

a) providing a first machine -readable database 
comprising data representing a multiplicity of postal 
Idlress- each postal address being formed of one or more 
postal address elements, 

b) providing a dictionary xn the form or 

, H at abase the dictionary comprising data 

machine-readable database, tne ^ nAina to a t least one 

• „ oar h pntrv corresponding to at 
representing entries, each entry 

postal address element represented by the data 

database, access the data stored 

c) providing a processor able 
15 . the first and second machine-readable databases, 

^ ^ the processor receiving input data -prising one or 
.ore input terms for finding a postal address represented 

the first database, dictionary for entries in 

e) the processor searching the dictionary 
20 Aina to the one or more input terms, 

4-v,~ riirtionary corresponding to tne one 

the tron y ^ ascertaining in£orm acion concerning 

data in Che first database representing tne or each postal 
r^ess eiement corresponding to the or ^J^Z^l the 
25 dictionary determined by the processor as corresponding 
one or more input terms, and 

g , the processor outputting data represent «*" ^ 
each Postal address, if any, represented by the frrst database 
eacn po^ information 
determined >y finance w ith the input 

30 ascertained in step f) as Demy 

data, wherein structure, 

the dictionary is in the form of a tree 
* vine a root node and terminating in a multiplicity of leaves, 
having a root no representative of 

the path from the root node to a leaf g 
35 an element of a postal address. 
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2. A method according to claim 1, wherein the first database 
of data representing a multiplicity of postal addresses is 
formed as a tree data structure having a root node and 
terminating in a multiplicity of leaves, the path from a root 

5 node to a leaf being representative of a postal address. 

3. A method according to claim 2, wherein there is provided 
data enabling the processor to determine whether a pair of 
nodes/ of the first database, at different levels in the tree 
data structure relate to the same postal address. 

10 4 - A method according to any preceding claim , wherein the 
data representing each postal address element in the first 
database comprises a code for the address element. 

5. A method according to any preceding claim , wherein the 
dictionary is so arranged that the nodes of the tree data 

15 structure each contain a portion of the or each entry in the 
dictionary sharing the stem defined by the path from the root 
to that node and a plurality of the nodes have a plurality of 
such portions. 

6. A method according to claim 5, wherein each portion either 
2 0 acts as a termination point or has a single path leading from 

it to another node . 

7. A method according to claim 5 or claim 6, wherein each 
said portion of at least some nodes of said plurality of the 
nodes is a single character. 

25 8. A method according to any of claims 5 to 7, wherein the 
dictionary is so arranged that at least some of said portions' 
of the nodes each comprise a plurality of characters. 
9. A method according to any preceding claim, wherein steps 
f ) and g) include ascertaining the location of each occurrence 

30 of data within the first database corresponding to the or each 
entry in the dictionary, determined by the processor as 
corresponding to the one or more input terms, and determining 
from the locations so ascertained the postal address or 
addresses being in accordance with the input data. 

35 10. A method according to any preceding claim , wherein step 
e) includes the processor initially searching the dictionary 
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for any entry in the dictionary identical to the or each input 



term 
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lx K method according to claim 10, wherein if during step e) 
no entry is found that is identical to any of the input terms 
Ze processor searches the dictionary for entries having a 
10 wer quality correspondence with the or each input term. 

2 . .method according to claim 10 or claim 11. wherein if a 
plurality of input terms are inputted and the processor finds 
during step e, one or more entries in the dictionary identical 
during step respectively, 
to at least one, but not all, of the inp 

thereby leaving one or more unmatched input terms, then the 
processor continues searching the dictionary for entries having 
flower quality correspondence with those unmatched input 



, method according to any preceding claim . wherein the 

postal address elements forming a postal address are -"onally 
postal au rU „„ is provided data enabling the 

divided into categories and there is previa address 
processor to ascertain the category of a given postal address 
element represented by data in the first database. 
0 ' 4 x method according to claim 13. wherein the 
0 programed to be able to be provided with input data including 
In indication of the category of postal address element that 
each of at least one of the input terms represents. 
T K method according to any preceding claim , wherein the 
, 5 input data received by the processor in step d, is processed by 
the processor before step e, is P«<«^ ssing o£ 

IS A method according to claim 15, wherein tne P 
the input data is for the purpose of reducing the livelihood of 
a costal address not being found through differences in data 
30 syTtaTbetween the input terms and the postal address elements 
presented by the data of either or both of the first and 

IT! ^"cording to claim IS or claim IS, when dependent 
on claim 1,, wherein the processing of the input dat, > includes 
35 ascertaining whether any of the input terms correspond to 



- 35 - 



category of postal address element and if so including an 
indication of the category in the input data. 
18. A method according to any preceding claim, wherein the 
input data is entered manually by a user. 
5 19. A method according to any of claims 1 to 17, wherein the 
input data is taken from a separate data store. 

20. A method of retrieving data representing a postal address 
from a database representing a multiplicity of postal 
addresses, the method comprising the following steps: 

10 i) providing a processor, 

ii) providing a database, accessible by' the processor, of 
data representing a multiplicity of postal addresses, a 
dictionary of terms corresponding to those found within the 
postal addresses, and location information enabling the 

15 processor to ascertain the one or more postal addresses in the 
database having a term corresponding to each dictionary entry, 

iii) providing the processor with input data for finding a 
postal address in the database, the processor then searching 
the dictionary for entries corresponding to the input data and 
ascertaining from the location information if any postal 
address in the database corresponds sufficiently closely to the 
input data, and 

iv) outputting data relating to the results of step iii) . 

21. A method according to claim 20, wherein the dictionary is 
in the form of a tree data structure, having a root node and 
terminating in a multiplicity of leaves, the path from the root 
node to a leaf being representative of term within a postal 
address in the database. 

22. Apparatus for retrieving data representing a postal 
address from a database representing a multiplicity of postal 
addresses , 

the apparatus including a computer processor, 
the apparatus being provided with 

a first database, accessible by the processor, 
35 comprising data representing a multiplicity of postal 
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addresses, each postal address being forced of ona or more 

poatal addraaa elements, and 

a dictionary in the fo« of a aecond database, 
accessible by the processor, comprise data representing 
entries, each entry corresponding to at least one postal 
address element represented by the data of the first 

database, wherein 

the dictionary is in the form of a tree data 
structure, having a root node and terminating m a 
multiplicity of leaves, the path from the root nod to 
leaf being representative of an element of a postal 
address, and 

the processor is programmed to be able 

to receive input data comprising an input term for 
finding a postal address represented in the first 

database, . 

to search the dictionary for entries in the 

dictionary corresponding to an input term, 

to ascertain information concerning data in the first 
.atabase representing the or each element corresponding 
the or each entry in the dictionary determined by 
processor as corresponding to an input term and 

to output data representing the or each postal 
address, if any, represented by the first database 

= « beina in accordance with 
determined by the processor as being in 

1-he input data. 
23 APParatus as claimed in claim «. the apparatus being 
arranged to be able to perform a method according to any of 

Apparatus' for retrieving data representing a postal 
address from a database representing a multiplicity of postal 
addresses, the apparatus J**^ acoessible 
a computer processor an one multiplicicy o£ post al 

Vw the processor, of data repj-^o ^ 

^dresses, a dictionary of terms found within the pos al 
addresses, and information enabling the processor to lint 
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given entry in the dictionary with the one or more postal 
addresses in the database having a term corresponding to the 
dictionary entry, 

the processor being programmed to be able to receive input 
5 data for finding a postal address in the database, to search 
the dictionary for entries corresponding to input data, to 
ascertain if any postal address in the database corresponds 
sufficiently closely to the input data, and to send output data 
relating to one or more postal addresses in the database. 
10 25. Apparatus as claimed in claim 24, arranged to be able to 
perform a method according to claim 21. 

26. Computer program product executable in a processor to 
perform a method as claimed in any of claims 1 to 21. 

27. Computer program product as claimed in claim 26 together 
with a data product, the data product enabling a processor once 
programmed with the computer program product to perform the 
method of any of claims 1 to 21 

28. Data product, accessible by a computer processor, the data 
product including 

data representing a multiplicity of postal addresses, each 
postal address being formed of one or more postal address 
elements , 

a dictionary comprising data representing entries, each 
entry corresponding to at least one postal address element 
represented by data in the data product, wherein the dictionary 
is in the form of a tree data structure, having a root node and 
terminating in a multiplicity of leaves, the path from the root 
node to a leaf being representative of an element of a postal 
address, and 

data linking a given entry in the dictionary with the one 
or more postal addresses in the data product having a term 
corresponding to the dictionary entry. 

29. Data product, accessible by a computer processor, the data 
product including 

a first data structure representing a multiplicity of 
postal addresses, each postal address being formed of one or 
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mo re postal address elements, the first data 

comprising respective codes representing respects postal 

address elements, dictionary, 
a second data structure, in the form ot a a 

• • a a multiplicity of entries, each entry corresponding 
comprising a multiplicity « 

to at least one postal address element represented by the 

in the first data structure, 

a third data structure lining each code in the frrst data 

structure to data from which the postal element represented by 

the code can be directly ascertained, and 

a fourth data structure uprising data linking . grven 
entry in the second data structure with each item of data „ 
the first data structure representing the postal address 
element corresponding to the entry in the second date 
structure . 
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